Abstract

We revisit the derivation of expurgated error exponents using a method of type class enumeration,
which is inspired by statistical-mechanical methods, and which has already been used in
the derivation of random coding exponents in several other scenarios. We compare our version of
the expurgated bound to both the one by Gallager and the one by Csiszar, Korner and Marton
(CKM). For expurgated ensembles of fixed composition codes over finite alphabets, our basic
expurgated bound coincides with the CKM expurgated bound, which is in general tighter than
Gallager's bound, but with equality for the optimum type class of codewords. Our method,
however, extends beyond fixed composition codes and beyond finite alphabets, where it is natural
to impose input constraints (e.g., power limitation). In such cases, the CKM expurgated
bound may not apply directly, and our bound is in general tighter than Gallager's bound. In
addition, while both the CKM and the Gallager expurgated bounds are based on the Bhattacharyya
bound for bounding the pairwise error probabilities, our bound allows the more general Chernoff
distance measure, thus giving rise to additional improvement using the Chernoff parameter as
a degree of freedom to be optimized.

Index Terms: Expurgated exponents, expurgated ensembles, Bhattacharyya distance, Chernoff
distance, random energy model.
1 Introduction



It is well known that the random coding exponent on the probability of error in channel coding
can be improved, at low coding rates, by a process called expurgation, which results in the so-called
expurgated exponent, or expurgated bound, a lower bound on the reliability function. The
idea of expurgation, first introduced by Gallager [8, Section V], [9, Section 5.7] (see also [24, Section
3.3]), is that at low rates, the average error probability over the ensemble of codes is dominated
by bad randomly chosen codewords and not by the channel noise. Therefore, by eliminating some
of these codewords (while keeping the rate almost the same), an improved lower bound on the
reliability function is obtained. The expurgated bound at zero rate is known to be tight, as it
coincides, at this point, with the straight-line bound, which is an upper bound on the reliability
function [9, Section 5.8], [20], [21], [24, Sections 3.7, 3.8]. Omura [19] was the first to relate the
expurgated exponent at low rates to distortion-rate functions, where the Bhattacharyya distance
function plays the role of a distortion measure.

Several years later, Csiszar, Korner and Marton [3] derived, for finite alphabets, a different 
expurgated bound, henceforth referred to as the CKM expurgated exponent, as opposed to the 
Gallager expurgated exponent discussed above. While ref. [3] contains no details (it is an abstract 
only), the CKM expurgated exponent is mentioned in [1, eq. (7)] and some hints on its derivation 
can be found in [2, p. 185, Problem 17]. While the CKM expurgated exponent is equivalent to 
that of Gallager for the optimum channel input assignment [2, p. 193, Problem 23(b)], it turns 
out (as will be shown below) that for a general input distribution, the CKM expurgated bound
is larger (and hence tighter) than the Gallager expurgated bound. This is important whenever 
channel input constraints (e.g., power limitation) do not allow this optimum input distribution to 
be used. On the other hand, since the derivation [2, pp. 185-186, Problem 17 (hint)] of the CKM 
expurgated exponent relies strongly on the packing lemma [2, p. 162, Lemma 5.1], it is limited to 
finite input and output alphabets (as mentioned) and to fixed composition codes, as opposed to 
the Gallager expurgated exponent, whose derivation is carried out under more general conditions. 

In this paper, our quest is to enjoy the best of both worlds: We use yet another analysis 
technique, which has already been used in several previous works in different scenarios [7], [10], [13], 
[14], [15, Chapters 6,7], [22], [23], where it has always yielded simplified and/or improved bounds 
on error exponents. This technique, which is based on distance enumeration, or more generally,
on type class enumeration, is inspired by the statistical-mechanical perspective on random coding, 
based on its analogy to the random energy model [18, Chapters 5, 6], which is a model of spin 
glasses with a high degree of disorder, invented by Derrida [4], [5], [6], and which is well known in 
the literature of statistical physics of magnetic materials. Our technique is applicable to channels 
with quite general input/output alphabets, it is not limited to fixed composition codes, and it 
allows the incorporation of channel input constraints, which are, of course, especially relevant when 
the channel input alphabet is continuous. In the special case of finite alphabets, our basic bound 
coincides with the CKM expurgated bound along the whole interesting range of rates, and hence 
is tighter, in general, than Gallager's expurgated exponent. 

Furthermore, an additional improvement of our expurgated bound is obtained by observing 
that, instead of using the Bhattacharyya bound for the pairwise error probabilities (as is done in 
the derivations of both the Gallager- and the CKM expurgated exponents), it turns out that for 
our proposed form of the expurgated exponent, the pairwise error probabilities can more generally 
be bounded using the Chernoff distance measure, whose parameter is subjected to optimization. 1 

Finally, as mentioned above, our analysis technique is based on a statistical-mechanical point 
of view. This point of view naturally suggests a physical interpretation to the behavior of the 
expurgated exponent in the following sense: Similarly as in Gallager's and the CKM expurgated 
exponents, the graph of the new proposed expurgated exponent is curvy at low rates and becomes 
a straight line of slope -1 at the higher range of rates. It turns out that this passage from a curve
to a straight line can be understood as a phase transition in the analogous statistical-mechanical 
system model - the random energy model. This point will be discussed as well. 

The outline of the remaining part of this paper is as follows. In Section 2, we provide some
background on the expurgated exponents of Gallager and Csiszar, Korner and Marton, as well
as the relationship between them. In Section 3, we provide a few elementary observations that
serve as a basis for our proposed derivation of the expurgated exponent. In Section 4, we present
the derivation of the new proposed version of our expurgated error exponent for finite alphabets
and fixed composition codes. In Section 5, we outline the extension of this analysis to continuous
alphabet channels. Finally, in Section 6, we discuss the statistical-mechanical perspective of our
analysis.

1 While Gallager's bound has a symmetry that guarantees that the optimum value of the Chernoff parameter is
always 1/2 (in which case, the Chernoff distance coincides with the Bhattacharyya distance), this symmetry does not
appear in the new proposed bound, and hence the optimum value of the Chernoff parameter is not necessarily 1/2.



2 Background 

Consider a discrete memoryless channel (DMC), defined by the single-letter transition probability
functions$^2$ $P = \{p(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$, where $\mathcal{X}$ and $\mathcal{Y}$ are the input alphabet and the output
alphabet, respectively. Let $Q = \{q(x),\ x \in \mathcal{X}\}$ be a probability function on the input alphabet $\mathcal{X}$.

Gallager's random coding error exponent function is a well known lower bound on the reliability 
function of the DMC [8], [9, Section 5.6], [24, Section 3.2]. It is given by 

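$$E_r(R) = \sup_{0 \le \rho \le 1} \sup_{Q}\, [E_0(\rho, Q) - \rho R] \tag{1}$$

where

$$E_0(\rho, Q) = -\ln\left( \sum_{y \in \mathcal{Y}} \left[ \sum_{x \in \mathcal{X}} q(x)\, p(y|x)^{1/(1+\rho)} \right]^{1+\rho} \right) \tag{2}$$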

and where here and throughout the sequel, it is understood that for continuous alphabets, summations
are replaced by integrals. This bound is obtained by analyzing the exponential rate of the
average error probability associated with a randomly chosen code $\mathcal{C}_n = \{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_M\}$, $M = e^{nR}$, $R$
being the coding rate and $\boldsymbol{x}_m \in \mathcal{X}^n$ being the codeword associated with message number $m \in \{1, \ldots, M\}$,
where each component of each codeword is selected independently at random under $Q$.
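
As a numerical aside, eqs. (1) and (2) can be evaluated directly for a small DMC. The sketch below is an illustration only and is not part of the original derivation: the function names, the grid search over $\rho$, and the binary symmetric channel in the usage lines are assumptions made here, and $Q$ is held fixed rather than optimized. Rates are in nats.

```python
import math

def gallager_e0(rho, q, p):
    """E_0(rho, Q) of eq. (2); q[x] is the input pmf, p[x][y] = p(y|x)."""
    total = 0.0
    for y in range(len(p[0])):
        inner = sum(q[x] * p[x][y] ** (1.0 / (1.0 + rho)) for x in range(len(q)))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def random_coding_exponent(rate, q, p, grid=1000):
    """Grid search of E_0(rho, Q) - rho * R over rho in [0, 1], for a fixed Q."""
    best = 0.0  # rho = 0 contributes exactly zero
    for k in range(1, grid + 1):
        rho = k / grid
        best = max(best, gallager_e0(rho, q, p) - rho * rate)
    return best

# Assumed example (not from the text): BSC with crossover 0.1 and uniform inputs.
bsc = [[0.9, 0.1], [0.1, 0.9]]
uniform = [0.5, 0.5]
zero_rate_exponent = random_coding_exponent(0.0, uniform, bsc)
above_capacity_exponent = random_coding_exponent(0.5, uniform, bsc)
print(zero_rate_exponent, above_capacity_exponent)
```

At zero rate the search settles at $\rho = 1$, and for a rate above the mutual information induced by $Q$ it stays at $\rho = 0$, where the bound is trivial.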

At low rates, this lower bound on the reliability function can be improved by expurgating 
the randomly chosen code. This expurgation is accomplished by discarding the 'bad' half of the 
codebook, namely, the half of codewords whose conditional error probabilities 

$$P_{e|m} = \Pr\{\text{error} \mid \text{message } m \text{ sent}\}$$

are the largest under maximum likelihood (ML) decoding. Gallager's expurgated exponent function 
[8], [9, Section 5.7] [24, Section 3.3] is given by 

$$E_{ex}(R) = \sup_{\rho \ge 1} \sup_{Q}\, [E_x(\rho, Q) - \rho R] \tag{3}$$



2 Here and throughout the sequel, "probability function" is a common name for a probability mass function in the 
discrete alphabet case and a probability density function in the continuous alphabet case. 
where

$$E_x(\rho, Q) = -\rho \ln\left( \sum_{x, x' \in \mathcal{X}} q(x) q(x') \left[ \sum_{y \in \mathcal{Y}} \sqrt{p(y|x)\, p(y|x')} \right]^{1/\rho} \right). \tag{4}$$

Improvement over $E_r(R)$ is accomplished whenever the coding rate $R$ is small enough such that
the supremum in eq. (3) is achieved (or approached) by values of $\rho$ that are strictly larger than 1,
as otherwise for $\rho = 1$, we have $E_x(1, Q) = E_0(1, Q)$.
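
The identity $E_x(1,Q) = E_0(1,Q)$ noted above is easy to confirm numerically. The following sketch is illustrative only: the function names and the asymmetric binary channel are assumptions chosen here, not taken from the references.

```python
import math

def gallager_e0(rho, q, p):
    """E_0(rho, Q) of eq. (2); q[x] is the input pmf, p[x][y] = p(y|x)."""
    total = 0.0
    for y in range(len(p[0])):
        inner = sum(q[x] * p[x][y] ** (1.0 / (1.0 + rho)) for x in range(len(q)))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def gallager_ex(rho, q, p):
    """E_x(rho, Q) of eq. (4)."""
    total = 0.0
    for x in range(len(q)):
        for x2 in range(len(q)):
            bhatt = sum(math.sqrt(p[x][y] * p[x2][y]) for y in range(len(p[0])))
            total += q[x] * q[x2] * bhatt ** (1.0 / rho)
    return -rho * math.log(total)

# Assumed asymmetric binary channel and input pmf, for illustration only.
channel = [[0.8, 0.2], [0.3, 0.7]]
q = [0.6, 0.4]
gap_at_rho_one = abs(gallager_ex(1.0, q, channel) - gallager_e0(1.0, q, channel))
print(gap_at_rho_one, gallager_ex(2.0, q, channel))
```

For this assumed channel, $E_x(\rho,Q)$ keeps growing beyond $\rho = 1$, which is the regime in which expurgation can help at sufficiently low rates.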

In [3] (see also [2, p. 185, Problem 17] for details), the following version of the expurgated 
exponent was presented by Csiszar, Korner and Marton (CKM) for channels with finite input and 
output alphabets: 



$$E_{ex}(R) = \sup_{Q} \inf_{Q_{XX'} \in \mathcal{A}(R,Q)} [I(X;X') + \boldsymbol{E} d_B(X,X')] - R, \tag{5}$$

where $Q_{XX'}$ is a generic joint probability mass function over $\mathcal{X}^2$, that governs both the mutual
information and the expectation in the square brackets of eq. (5),

$$\mathcal{A}(R,Q) = \{Q_{XX'}:\ Q_X = Q_{X'} = Q,\ I(X;X') \le R\},$$

and $d_B(\cdot,\cdot)$ is the Bhattacharyya distance function, defined by

$$d_B(x,x') = -\ln\left[ \sum_{y \in \mathcal{Y}} \sqrt{p(y|x)\, p(y|x')} \right]. \tag{6}$$

In [2, p. 193, Problem 23b] it is asserted that the right-hand sides of eqs. (3) and (5) are equivalent,
thus justifying the common notation $E_{ex}(R)$ for both expressions. Hereafter, to avoid confusion
between the Gallager and the CKM expurgated exponents, we will deviate from the customary
notation used above, and re-define the notation $E_G(\rho,Q)$ for $E_x(\rho,Q)$ (where the subscript G
stands for "Gallager"), and accordingly

$$\mathcal{E}_G(R,Q) = \sup_{\rho \ge 1} [E_G(\rho,Q) - \rho R], \tag{7}$$

thus, $E_{ex}(R) = \sup_Q \mathcal{E}_G(R,Q)$. Similarly, we will denote

$$\mathcal{E}_{CKM}(R,Q) = \inf_{Q_{XX'} \in \mathcal{A}(R,Q)} [I(X;X') + \boldsymbol{E} d_B(X,X')] - R, \tag{8}$$

thus, $E_{ex}(R) = \sup_Q \mathcal{E}_{CKM}(R,Q)$.
While $\sup_Q \mathcal{E}_G(R,Q) = \sup_Q \mathcal{E}_{CKM}(R,Q)$ as mentioned above, it turns out that for a general
choice of $Q$, the functions $\mathcal{E}_G(R,Q)$ and $\mathcal{E}_{CKM}(R,Q)$ may differ. In fact, as we shall see shortly,

$$\mathcal{E}_{CKM}(R,Q) \ge \mathcal{E}_G(R,Q) \tag{9}$$

for an arbitrary input assignment $Q$. This is an important point since the optimum input assignment
$Q^*$, that achieves $E_{ex}(R)$, might be forbidden in the presence of channel input constraints (e.g.,
power limitation), and so, in such a case, the CKM expurgated exponent may be better than
the Gallager expurgated exponent. On the other hand, there are two advantages to the Gallager
expurgated exponent relative to the CKM expurgated exponent. The first is that, unlike the case
of the CKM bound, its derivation is not sensitive to the assumption of finite alphabets and fixed
composition codes.$^3$ The second advantage is that the numerical calculation of $\mathcal{E}_G(R,Q)$ requires
optimization over one parameter only (the parameter $\rho$), whereas the calculation of $\mathcal{E}_{CKM}(R,Q)$
seems (at least in its present form) to require optimization over the entire joint distribution $Q_{XX'}$
(which means many parameters for a large input alphabet) and moreover, this optimization is
subject to complicated constraints (defined by $\mathcal{A}(R,Q)$).

3 Some Preliminary Observations 

Before presenting the proposed alternative derivation of our expurgated exponent, we pause to offer
a few preliminary observations that would hopefully help to compare $\mathcal{E}_G(R,Q)$ and $\mathcal{E}_{CKM}(R,Q)$
and to understand the relationships between them, as well as their relation to that of the new
bound to be derived. In particular, our first task is to transform the expression of $\mathcal{E}_{CKM}(R,Q)$ to
a form that has the same ingredients as those of $\mathcal{E}_G(R,Q)$.

We first define the function

$$D_Q(R) = \min_{Q_{XX'} \in \mathcal{A}(R,Q)} \boldsymbol{E}\{d_B(X,X')\}. \tag{10}$$

Intuitively, the function $D_Q(R)$ is the distortion-rate function of a "source" $Q$ (designated by
the random variable $X$) with respect to (w.r.t.) the Bhattacharyya distortion measure $d_B(\cdot,\cdot)$,
subject to the additional constraint that the "reproduction variable" $X'$ has the same probability
distribution $Q$ as the "source." It is easy to see now that

$$\inf_{Q_{XX'} \in \mathcal{A}(R,Q)} [I(X;X') + \boldsymbol{E} d_B(X,X')] = \begin{cases} D_Q(R) + R & R \le R_1 \\ D_Q(R_1) + R_1 & R > R_1 \end{cases} \tag{11}$$

where $R_1$ is $I(X;X')$ for the optimum $Q_{XX'}$ that minimizes $[I(X;X') + \boldsymbol{E} d_B(X,X')]$ across
$\mathcal{A}(\infty,Q)$, or equivalently, $R_1$ is the rate $R$ at which $D'_Q(R) = -1$, $D'_Q(R)$ being the derivative
of $D_Q(R)$ w.r.t. $R$. Thus, we obtain

$$\mathcal{E}_{CKM}(R,Q) = \begin{cases} D_Q(R) & R \le R_1 \\ D_Q(R_1) + R_1 - R & R_1 < R \le D_Q(R_1) + R_1 \\ 0 & R > D_Q(R_1) + R_1 \end{cases} \tag{12}$$

where we note that the first line is intimately related to [2, p. 194, Problem 24]. We observe then
that at low rates, $\mathcal{E}_{CKM}(R,Q)$ has a curvy part given by $D_Q(R)$, and for high rates it is given by
the straight line of slope $-1$ that is tangential to the curve $D_Q(R)$.

3 In fact, in the case of a continuous input alphabet, the notion of fixed composition codes does not really exist
altogether.

Let us now take a closer look at the distortion-rate function $D_Q(R)$, which is the inverse of
the rate-distortion function $R_Q(D)$, defined similarly, and again with the additional constraint
$Q_{X'} = Q$. This rate-distortion function has the following parametric representation [17, eq. (13)]:

$$R_Q(D) = -\inf_{s \ge 0} \left[ sD + \sum_{x \in \mathcal{X}} q(x) \ln \sum_{x' \in \mathcal{X}} q(x') e^{-s d_B(x,x')} \right] \tag{13}$$

where the minimizing $s$ is interpreted as the negative local slope of the function $R_Q(D)$, i.e.,
$s^* = -R'_Q(D)$, $s^*$ being the minimizer of the r.h.s. This function can easily be inverted, similarly
as in [16, eqs. (15)-(20)], to obtain

$$D_Q(R) = -\inf_{s \ge 0} \frac{1}{s} \left[ \sum_{x \in \mathcal{X}} q(x) \ln \sum_{x' \in \mathcal{X}} q(x') e^{-s d_B(x,x')} + R \right] \tag{14}$$

$$\phantom{D_Q(R)} = \sup_{\rho \ge 0} \left[ -\rho \sum_{x \in \mathcal{X}} q(x) \ln \sum_{x' \in \mathcal{X}} q(x') e^{-d_B(x,x')/\rho} - \rho R \right] \tag{15}$$

where the second line follows from the first simply by changing the variable $s$ to the variable
$\rho = 1/s$. Thus, the maximizing $\rho$ is the negative local slope of the function $D_Q(R)$. It follows that
in the curvy part of $\mathcal{E}_{CKM}(R,Q)$, where the slope of $D_Q(R)$ is smaller than $-1$, the maximizing $\rho$
is larger than 1. Thus, the maximization in the last expression of $D_Q(R)$ can be confined to the
range $[1,\infty)$, i.e., for $R \le R_1$,



$$\mathcal{E}_{CKM}(R,Q) = \sup_{\rho \ge 1} \left[ -\rho \sum_{x \in \mathcal{X}} q(x) \ln \sum_{x' \in \mathcal{X}} q(x') e^{-d_B(x,x')/\rho} - \rho R \right]$$

$$\phantom{\mathcal{E}_{CKM}(R,Q)} = \sup_{\rho \ge 1} \left\{ -\rho \sum_{x \in \mathcal{X}} q(x) \ln \left( \sum_{x' \in \mathcal{X}} q(x') \left[ \sum_{y \in \mathcal{Y}} \sqrt{p(y|x)\, p(y|x')} \right]^{1/\rho} \right) - \rho R \right\}, \tag{16}$$

and of course, for $R \in (R_1, R_1 + D_Q(R_1)]$ we use the same expression, setting $\rho = 1$. This should
now be compared with Gallager's expression

$$\mathcal{E}_G(R,Q) = \sup_{\rho \ge 1} \left\{ -\rho \ln \left( \sum_{x, x' \in \mathcal{X}} q(x) q(x') \left[ \sum_{y \in \mathcal{Y}} \sqrt{p(y|x)\, p(y|x')} \right]^{1/\rho} \right) - \rho R \right\}. \tag{17}$$

As can be seen, the only difference between the two expressions is that in $\mathcal{E}_{CKM}(R,Q)$, the averaging
over $x$ is external to the logarithmic function, whereas in $\mathcal{E}_G(R,Q)$ it is internal to the
logarithmic function. Thus, Jensen's inequality guarantees that $\mathcal{E}_{CKM}(R,Q) \ge \mathcal{E}_G(R,Q)$, and
since the logarithmic function is strictly concave, the inequality is strict for every finite $\rho$ (which
means $R > 0$), unless $\sum_{x' \in \mathcal{X}} q(x') e^{-d_B(x,x')/\rho}$ happens to be independent of $x$, which is the case
when either $Q$ and $P$ exhibit enough symmetry, or when $Q$ is chosen to be the optimum distribution
[2, p. 193, Problem 23b, hint (iii)].
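
The Jensen gap between the bracketed terms of eqs. (16) and (17) can be observed numerically for a fixed $\rho$. The sketch below is illustrative only: the function names and the two test channels (one asymmetric, one symmetric with uniform inputs) are assumptions made here. Since the $-\rho R$ term is common to both expressions, it is omitted.

```python
import math

def bhattacharyya_coefficient(p, x, x2):
    """sum_y sqrt(p(y|x) p(y|x')), i.e. exp(-d_B(x, x'))."""
    return sum(math.sqrt(p[x][y] * p[x2][y]) for y in range(len(p[x])))

def inner_sum(rho, q, p, x):
    """sum_{x'} q(x') exp(-d_B(x, x') / rho) for a fixed x."""
    return sum(q[x2] * bhattacharyya_coefficient(p, x, x2) ** (1.0 / rho)
               for x2 in range(len(q)))

def ckm_objective(rho, q, p):
    """Term of eq. (16): the average over x is outside the logarithm."""
    return -rho * sum(q[x] * math.log(inner_sum(rho, q, p, x)) for x in range(len(q)))

def gallager_objective(rho, q, p):
    """Term of eq. (17): the average over x is inside the logarithm."""
    return -rho * math.log(sum(q[x] * inner_sum(rho, q, p, x) for x in range(len(q))))

# Assumed channels, for illustration only.
asymmetric, q = [[0.8, 0.2], [0.3, 0.7]], [0.6, 0.4]
bsc, uniform = [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5]

rhos = [1.0, 1.5, 2.0, 5.0, 20.0]
gaps = [ckm_objective(r, q, asymmetric) - gallager_objective(r, q, asymmetric) for r in rhos]
symmetric_gap = ckm_objective(2.0, uniform, bsc) - gallager_objective(2.0, uniform, bsc)
print(gaps, symmetric_gap)
```

In the symmetric case the inner sum does not depend on $x$, so the two objectives coincide up to rounding, in line with the equality condition stated above.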

Our second preliminary observation is the following. The derivation of Gallager's expurgated
exponent begins from the union bound on the pairwise error probabilities, which in turn are all
upper bounded by the Bhattacharyya bound, i.e., eq. (5.7.3) in [9] reads

$$P_{e|m} \le \sum_{m' \ne m} \sum_{\boldsymbol{y}} \sqrt{p(\boldsymbol{y}|\boldsymbol{x}_m)\, p(\boldsymbol{y}|\boldsymbol{x}_{m'})}, \tag{18}$$

where $\boldsymbol{y} \in \mathcal{Y}^n$ designates the channel output vector. One might suspect that a better result can
probably be obtained by considering, more generally, the Chernoff bound

$$P_{e|m} \le \sum_{m' \ne m} \sum_{\boldsymbol{y}} p^{s}(\boldsymbol{y}|\boldsymbol{x}_{m'})\, p^{1-s}(\boldsymbol{y}|\boldsymbol{x}_m), \qquad 0 \le s \le 1, \tag{19}$$

where the Chernoff parameter $s$ is subject to optimization (in addition to the parameter $\rho$). After
carrying out the derivation similarly as in [9, Section 5.7], one would obtain a similar expression as
in $\mathcal{E}_G(R,Q)$, except that the Bhattacharyya distance function is replaced, more generally, by the
Chernoff distance function

$$d_s(x,x') = -\ln\left[ \sum_{y \in \mathcal{Y}} p^{1-s}(y|x)\, p^{s}(y|x') \right]. \tag{20}$$
Thus, $E_G(\rho,Q)$ would be replaced by

$$E_G(\rho,s,Q) = -\rho \ln\left( \sum_{x,x' \in \mathcal{X}} q(x) q(x') \left[ \sum_{y \in \mathcal{Y}} p^{1-s}(y|x)\, p^{s}(y|x') \right]^{1/\rho} \right) \tag{21}$$

and the best choice of $s$ would be the one that maximizes $E_G(\rho,s,Q)$. However, it is easy to
see that $E_G(\rho,s,Q)$ is concave in $s$ and that $E_G(\rho,s,Q) = E_G(\rho,1-s,Q)$ since $x$ and $x'$ play
symmetric roles in the expression of $E_G(\rho,s,Q)$. Thus, the maximizing $s$ is obviously $s^* = 1/2$,
which brings us back to the Bhattacharyya distance, confirming that there is nothing to gain
from the optimization over $s$ beyond Gallager's expurgated bound.
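
The symmetry argument can be checked on a grid of $s$ values. The sketch below is illustrative only: the function names, the asymmetric channel, the input assignment and the choice $\rho = 2$ are assumptions made here.

```python
import math

def chernoff_sum(s, p, x, x2):
    """sum_y p(y|x)^(1-s) p(y|x')^s, i.e. exp(-d_s(x, x')) of eq. (20)."""
    return sum(p[x][y] ** (1.0 - s) * p[x2][y] ** s for y in range(len(p[x])))

def gallager_chernoff(rho, s, q, p):
    """E_G(rho, s, Q) of eq. (21)."""
    total = sum(q[x] * q[x2] * chernoff_sum(s, p, x, x2) ** (1.0 / rho)
                for x in range(len(q)) for x2 in range(len(q)))
    return -rho * math.log(total)

# Assumed asymmetric channel and input pmf, for illustration only.
channel, q, rho = [[0.8, 0.2], [0.3, 0.7]], [0.6, 0.4], 2.0
grid = [k / 100 for k in range(101)]
values = [gallager_chernoff(rho, s, q, channel) for s in grid]
best_s = grid[values.index(max(values))]
print(best_s, values[30], values[70])
```

Even though the channel and the input assignment are both asymmetric here, the grid maximizer sits at $s = 1/2$, because the double sum over $(x,x')$ restores the symmetry between $s$ and $1-s$.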

This is not the case, however, when it comes to the CKM expurgated bound. In particular,
Csiszar and Korner also begin from the union-Bhattacharyya bound (see [2, p. 186, top]), and an
extension of their derivation would yield the same expression as (16), but again, with the Bhattacharyya
distance $d_B(x,x')$ (or $d_{1/2}(x,x')$) being replaced by the more general Chernoff distance
$d_s(x,x')$. However, here $x$ and $x'$ do not have symmetric roles and hence the bound is not necessarily
optimized at $s = 1/2$. Indeed, it is easy to study a simple example of a binary non-symmetric
channel and see that the derivative of the function

$$E(\rho,s,Q) = -\rho \sum_{x \in \mathcal{X}} q(x) \ln\left( \sum_{x' \in \mathcal{X}} q(x') \left[ \sum_{y \in \mathcal{Y}} p^{1-s}(y|x)\, p^{s}(y|x') \right]^{1/\rho} \right) \tag{22}$$

with respect to $s$ does not vanish at $s = 1/2$ unless $Q$ is symmetric (see also Example 1 below, at
the end of this section).

To summarize, we observe that the CKM expurgated bound is not only better, in general, than 
the Gallager expurgated bound, but moreover, it provides even further room for improvement in 
the optimization over s, in addition to the optimization over p. Confining the framework to finite 
alphabets and fixed composition codes, this gives rise to the following coding theorem. 



Theorem 1 For an arbitrary DMC, there exists a sequence of codes $\{\mathcal{C}_n\}_{n \ge 1}$ of rate $R$ and composition
$Q$,$^4$ for which the error exponent associated with the maximum error probability is at least
as large as

$$\mathcal{E}(R,Q) = \sup_{\rho \ge 1} \sup_{0 \le s \le 1} [E(\rho,s,Q) - \rho R], \tag{23}$$

where $E(\rho,s,Q)$ is defined as in eq. (22).

4 A sequence of codes with composition $Q$ means a sequence of fixed composition codes, where the common
empirical distribution of all codewords tends to $Q$ as $n \to \infty$.

Example 1 - binary input, binary output channels. We have compared numerically the three
expurgated exponents for various combinations of $P$ and $Q$ associated with binary input, binary
output channels. As a representative example, we have computed $E_G(1,Q)$, $E(1,1/2,Q)$
and $\max_{0 \le s \le 1} E(1,s,Q)$, for the binary channel $P$ defined by $p(0|0) = p(1|0) = 0.5$, $p(0|1) =
1 - p(1|1) = 10^{-10}$, along with the input assignment $Q$ given by $q(1) = 1 - q(0) = 0.1$. The results
are $E_G(1,Q) = 0.0542$, $E(1,1/2,Q) = 0.0574$, and $\max_{0 \le s \le 1} E(1,s,Q) = 0.0596$, which is achieved
at $s^* \approx 0.76$. This means that in the range of high rates, we have

$$\mathcal{E}_G(R,Q) = 0.0542 - R \tag{24}$$
$$\mathcal{E}_{CKM}(R,Q) = 0.0574 - R \tag{25}$$
$$\mathcal{E}(R,Q) = 0.0596 - R. \tag{26}$$

Thus, numerical evidence indeed supports the fact that there are gaps between the three expurgated
exponents, at least for some combinations of channels and input assignments.
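
The three quantities of Example 1 can be recomputed from eqs. (21) and (22) with a few lines of code. The sketch below is illustrative only: the function names and the grid search over $s$ (with step 0.001) are assumptions made here, so the located maximizer is only approximate and should be compared with the reported figures rather than taken as exact.

```python
import math

def chernoff_sum(s, p, x, x2):
    """sum_y p(y|x)^(1-s) p(y|x')^s, i.e. exp(-d_s(x, x'))."""
    return sum(p[x][y] ** (1.0 - s) * p[x2][y] ** s for y in range(len(p[x])))

def gallager_chernoff(rho, s, q, p):
    """E_G(rho, s, Q) of eq. (21): average over x inside the logarithm."""
    total = sum(q[x] * q[x2] * chernoff_sum(s, p, x, x2) ** (1.0 / rho)
                for x in range(len(q)) for x2 in range(len(q)))
    return -rho * math.log(total)

def new_chernoff(rho, s, q, p):
    """E(rho, s, Q) of eq. (22): average over x outside the logarithm."""
    return -rho * sum(
        q[x] * math.log(sum(q[x2] * chernoff_sum(s, p, x, x2) ** (1.0 / rho)
                            for x2 in range(len(q))))
        for x in range(len(q)))

# Channel and input assignment as stated in Example 1.
channel = [[0.5, 0.5], [1e-10, 1.0 - 1e-10]]
q = [0.9, 0.1]

e_gallager = gallager_chernoff(1.0, 0.5, q, channel)
e_ckm = new_chernoff(1.0, 0.5, q, channel)
grid = [k / 1000 for k in range(1001)]
e_new, s_star = max((new_chernoff(1.0, s, q, channel), s) for s in grid)
print(round(e_gallager, 4), round(e_ckm, 4), round(e_new, 4), s_star)
```

The objective is rather flat around its maximum, so the grid maximizer may differ slightly from the quoted $s^*$ while the maximal value agrees to the displayed precision.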

4 New Derivation of the Expurgated Exponent 

Equipped with the background of Section 2 and the observations offered in Section 3, we next 
proceed to the derivation of the new version of the expurgated bound (i.e., prove Theorem 1), but 
in a manner that does not rely on the packing lemma and hence is not sensitive to the assumptions 
of fixed composition codes and finite alphabets. We will assume finite alphabets only for the 
simplicity of the exposition and for the sake of convenience, but it should be understood that our
analysis has a natural extension to continuous alphabets (along with channel input constraints), 
and we will outline this extension in Section 5. 

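Following the discussion in Section 3, we begin with the following upper bound on the conditional
probability of error:

$$P_{e|m} \le \sum_{m' \ne m} \sum_{\boldsymbol{y}} p^{1-s}(\boldsymbol{y}|\boldsymbol{x}_m)\, p^{s}(\boldsymbol{y}|\boldsymbol{x}_{m'}), \qquad 0 \le s \le 1. \tag{27}$$

Now, following the same rationale as in [9, Section 5.7] and [24, Section 3.3], we argue the following:
There exists a codebook $\mathcal{C}_n = \{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_M\}$ of $M = e^{nR}$ codewords such that for every $\rho > 0$ and
all $1 \le m \le M$,

$$P_{e|m} \le 2^{\rho} \left( \boldsymbol{E}\left\{ P_{e|m}^{1/\rho} \right\} \right)^{\rho} \le 2^{\rho} \left( \boldsymbol{E}\left\{ \left[ \sum_{m' \ne m} \sum_{\boldsymbol{y}} p^{1-s}(\boldsymbol{y}|\boldsymbol{X}_m)\, p^{s}(\boldsymbol{y}|\boldsymbol{X}_{m'}) \right]^{1/\rho} \right\} \right)^{\rho} \triangleq 2^{\rho} A_n(R,\rho), \tag{28}$$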



where the expectation operator is taken w.r.t. the randomness of the codewords $\{\boldsymbol{X}_m\}$, which are
selected independently at random according to the uniform distribution over the type class $T_Q$,
that is, the set of all sequences whose empirical distribution is (as close as possible to) $Q$.

For the purpose of further bounding $A_n(R,\rho)$, the next step in both [9] and [24] is to use the
inequality $[\sum_{m'} a_{m'}]^{1/\rho} \le \sum_{m'} a_{m'}^{1/\rho}$, which holds for every $\rho \ge 1$, and then to apply the expectation
operator on each term of the corresponding sum separately. This is a step which simplifies the
derivation to a large extent, but at the possible price of losing exponential tightness of the resulting
bound. Instead, in our derivation, we will use another approach, which yields an exponentially
tight bound. Defining

$$d_s(x,x') = -\ln\left[ \sum_{y \in \mathcal{Y}} p^{1-s}(y|x)\, p^{s}(y|x') \right], \tag{29}$$

we have, due to the memorylessness of the channel,

$$\sum_{\boldsymbol{y}} p^{1-s}(\boldsymbol{y}|\boldsymbol{x}_m)\, p^{s}(\boldsymbol{y}|\boldsymbol{x}_{m'}) = e^{-\sum_{i=1}^{n} d_s(x_{m,i},\, x_{m',i})} \triangleq e^{-d_s(\boldsymbol{x}_m, \boldsymbol{x}_{m'})}, \tag{30}$$

where $x_{m,i}$ is the $i$-th component of the codeword $\boldsymbol{x}_m$. Let $N_m(Q_{XX'})$ be the number of codewords
$\{\boldsymbol{x}_{m'}\}$ that, together with $\boldsymbol{x}_m$, fall in the joint type class corresponding to the joint empirical
distribution $Q_{XX'}$, both of whose marginals must agree with $Q$ (as they are both empirical distributions
of codewords). Then, we have



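$$\begin{aligned}
A_n(R,\rho) &= \left( \boldsymbol{E}\left\{ \left[ \sum_{Q_{XX'}} N_m(Q_{XX'}) \exp\{-n \boldsymbol{E} d_s(X,X')\} \right]^{1/\rho} \right\} \right)^{\rho} \\
&\doteq \left( \boldsymbol{E}\left\{ \left[ \max_{Q_{XX'}} N_m(Q_{XX'}) \exp\{-n \boldsymbol{E} d_s(X,X')\} \right]^{1/\rho} \right\} \right)^{\rho} \\
&= \left( \boldsymbol{E}\left\{ \max_{Q_{XX'}} [N_m(Q_{XX'})]^{1/\rho} \exp\{-n \boldsymbol{E} d_s(X,X')/\rho\} \right\} \right)^{\rho} \\
&\doteq \left( \boldsymbol{E}\left\{ \sum_{Q_{XX'}} [N_m(Q_{XX'})]^{1/\rho} \exp\{-n \boldsymbol{E} d_s(X,X')/\rho\} \right\} \right)^{\rho} \\
&= \left( \sum_{Q_{XX'}} \boldsymbol{E}\left\{ [N_m(Q_{XX'})]^{1/\rho} \right\} \cdot \exp\{-n \boldsymbol{E} d_s(X,X')/\rho\} \right)^{\rho} \\
&\doteq \left( \max_{Q_{XX'}} \boldsymbol{E}\left\{ [N_m(Q_{XX'})]^{1/\rho} \right\} \cdot \exp\{-n \boldsymbol{E} d_s(X,X')/\rho\} \right)^{\rho} \\
&= \max_{Q_{XX'}} \left( \boldsymbol{E}\left\{ [N_m(Q_{XX'})]^{1/\rho} \right\} \right)^{\rho} \cdot \exp\{-n \boldsymbol{E} d_s(X,X')\},
\end{aligned} \tag{31}$$

where the notation $\doteq$ designates equivalence in the exponential scale (i.e., $a_n \doteq b_n$ means that
$\frac{1}{n} \ln \frac{a_n}{b_n} \to 0$ as $n \to \infty$), and where the expectation at the exponent is w.r.t. $Q_{XX'}$. Now, similarly
as in [12, p. 4444, eq. (34)], we have

$$\boldsymbol{E}\left\{ [N_m(Q_{XX'})]^{1/\rho} \right\} \doteq \begin{cases} \exp\{n[R - I(X;X')]\} & R \le I(X;X') \\ \exp\{n[R - I(X;X')]/\rho\} & R > I(X;X') \end{cases} \tag{32}$$

where $I(X;X')$ is the mutual information between $X$ and $X'$ associated with $Q_{XX'}$. This result
follows from the fact that given $\boldsymbol{X}_m = \boldsymbol{x}_m$, $N_m(Q_{XX'})$ is the sum of $e^{nR} - 1$ binary independent
random variables,

$$U_{m'} = 1\{(\boldsymbol{x}_m, \boldsymbol{X}_{m'}) \text{ have empirical joint distribution } Q_{XX'}\}, \qquad m' \ne m, \tag{33}$$

whose expectations are all of the exponential order of $e^{-nI(X;X')}$. Upon taking into account all the
possible empirical distributions $\{Q_{XX'}\}$, we readily obtain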

$$A_n(R,\rho) \doteq e^{-n \min\{E_1(R,\rho),\, E_2(R)\}}, \tag{34}$$

where

$$E_1(R,\rho) = \min_{Q_{X'|X}:\ I(X;X') \ge R} [\boldsymbol{E} d_s(X,X') + \rho I(X;X')] - \rho R \tag{35}$$

and

$$E_2(R) = \min_{Q_{X'|X}:\ I(X;X') \le R} [\boldsymbol{E} d_s(X,X') + I(X;X')] - R = \sup_{\rho \ge 1} [E(\rho,s,Q) - \rho R], \tag{36}$$

where the second equality is obtained similarly as in the derivation of eq. (16), but with the
Bhattacharyya distortion measure being replaced by $d_s(\cdot,\cdot)$. It remains to show that $E_1(R,\rho)$,
for the optimum choice of $\rho$, is never smaller than $\mathcal{E}_{CKM}(R,Q)$. For a given $s$, let $R_Q(D)$ be the
rate-distortion function of $X$ w.r.t. the distortion measure $\{d_s(x,x')\}$ subject to the constraint that
$Q_{X'} = Q$. Let $D_\rho$ be the distortion level at which $R'_Q(D) = -1/\rho$, where $R'_Q(\cdot)$ is the derivative of
$R_Q(\cdot)$. Also, $D_Q(R)$ will denote the corresponding distortion-rate function, which is the inverse of
$R_Q(D)$. Then $E_1(R,\rho)$ admits the following expression:

$$E_1(R,\rho) = \begin{cases} D_\rho + \rho[R_Q(D_\rho) - R] & R < R_Q(D_\rho) \\ D_Q(R) & R \ge R_Q(D_\rho) \end{cases} \tag{37}$$

As the straight line $D_\rho + \rho[R_Q(D_\rho) - R]$ is tangential to (and below) the convex function $D_Q(R)$,
the best choice of $\rho$ is to take the limit $\rho \to \infty$. But $E_1(R,\infty) = D_Q(R)$ for all $R$ (as $R_Q(D_\infty) = 0$),
which is in turn at least as large as $E_2(R) = \sup_{\rho \ge 1}[E(\rho,s,Q) - \rho R]$ for all $R$, and strictly so in
the linear part of the latter function.
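
The two regimes of eq. (32) reflect a general property of fractional moments of a sum of many rare, independent indicators: when the mean count is far below one, $\boldsymbol{E}\{N^{1/\rho}\}$ behaves like the mean itself, and when it is far above one, it behaves like the mean raised to the power $1/\rho$. The sketch below illustrates this with an exact binomial computation; the function name and the specific numbers (10,000 trials, success probabilities $10^{-6}$ and $0.05$, $\rho = 2$) are assumptions chosen here for illustration and play the roles of $e^{nR}$ and $e^{-nI(X;X')}$ only loosely.

```python
import math

def fractional_moment(trials, prob, rho):
    """Exact E[N^(1/rho)] for N ~ Binomial(trials, prob), summed term by term."""
    total = 0.0
    for k in range(1, trials + 1):  # the k = 0 term contributes nothing
        log_pmf = (math.lgamma(trials + 1) - math.lgamma(k + 1)
                   - math.lgamma(trials - k + 1)
                   + k * math.log(prob) + (trials - k) * math.log1p(-prob))
        total += math.exp(log_pmf) * k ** (1.0 / rho)
    return total

trials, rho = 10_000, 2.0
rare = fractional_moment(trials, 1e-6, rho)    # mean count 0.01 (analogue of R < I)
dense = fractional_moment(trials, 0.05, rho)   # mean count 500  (analogue of R > I)

rare_ratio = rare / (trials * 1e-6)                   # compare with the mean
dense_ratio = dense / (trials * 0.05) ** (1.0 / rho)  # compare with mean^(1/rho)
print(rare_ratio, dense_ratio)
```

Both ratios come out close to one, which is the finite-size counterpart of the two lines of eq. (32).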

Thus, for a given $s$, there exists a sequence of codes for which the exponent of the maximum
probability of error is dominated by $\sup_{\rho \ge 1}[E(\rho,s,Q) - \rho R]$. Upon maximization over $s$, this yields
$\mathcal{E}(R,Q)$, as asserted in Theorem 1.
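
The type class enumeration step behind eq. (31), namely rewriting the sum over incorrect codewords as a sum over joint types weighted by the enumerators $N_m(Q_{XX'})$, can be verified on a toy codebook. The sketch below is illustrative only: the helper names, the random fixed composition codebook (block length 10, 50 codewords, seed 0), the channel and the value $s = 0.6$ are all assumptions made here.

```python
import math
import random
from collections import Counter

def chernoff_distance(s, p, x, x2):
    """Single-letter d_s(x, x') of eq. (29)."""
    return -math.log(sum(p[x][y] ** (1.0 - s) * p[x2][y] ** s for y in range(len(p[x]))))

def direct_sum(codebook, m, s, p):
    """Sum over m' != m of exp(-d_s(x_m, x_m')), using the additivity in eq. (30)."""
    total = 0.0
    for m2, other in enumerate(codebook):
        if m2 != m:
            dist = sum(chernoff_distance(s, p, a, b) for a, b in zip(codebook[m], other))
            total += math.exp(-dist)
    return total

def enumerated_sum(codebook, m, s, p):
    """The same sum, grouped by the joint type of (x_m, x_m')."""
    n = len(codebook[m])
    type_counts = Counter()  # joint type -> number of codewords N_m(Q_XX')
    for m2, other in enumerate(codebook):
        if m2 != m:
            joint_type = tuple(sorted(Counter(zip(codebook[m], other)).items()))
            type_counts[joint_type] += 1
    total = 0.0
    for joint_type, num_codewords in type_counts.items():
        mean_distance = sum(c * chernoff_distance(s, p, a, b)
                            for (a, b), c in joint_type) / n
        total += num_codewords * math.exp(-n * mean_distance)
    return total

rng = random.Random(0)
base = [0] * 6 + [1] * 4                      # common composition Q = (0.6, 0.4)
codebook = [rng.sample(base, len(base)) for _ in range(50)]
channel = [[0.8, 0.2], [0.3, 0.7]]
lhs = direct_sum(codebook, 0, 0.6, channel)
rhs = enumerated_sum(codebook, 0, 0.6, channel)
print(lhs, rhs)
```

The two sums agree up to rounding because $d_s(\boldsymbol{x}_m, \boldsymbol{x}_{m'})$ depends on the pair of codewords only through their joint empirical distribution.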

5 Beyond Finite Alphabets and Fixed Composition Codes 

In Section 4, we have assumed finite alphabets and fixed composition codes, mainly for the simplicity 
of the exposition and for the purpose of comparison with the CKM expurgated exponent. However, 
as we have mentioned already, the analysis in Section 4 is not really sensitive to these assumptions. 

The heart of the analysis in Section 4 is around equations (31) and (32), and therefore, the main
issue in the desired extension is to adapt this part of the analysis to continuous alphabets. Consider
now the case where $\mathcal{X} = \mathcal{Y} = \mathbb{R}$, so that $q(x)$ and $p(y|x)$ are probability density functions. Let $\delta$
be an arbitrarily small positive real. Then,

$$\sum_{m' \ne m} e^{-d_s(\boldsymbol{x}_m, \boldsymbol{x}_{m'})} \le \sum_{k=0}^{\infty} e^{-nk\delta} N_m(k), \tag{38}$$

where

$$N_m(k) = \sum_{m' \ne m} 1\{nk\delta \le d_s(\boldsymbol{x}_m, \boldsymbol{x}_{m'}) < n(k+1)\delta\}, \qquad k = 0, 1, 2, \ldots \tag{39}$$

Let us assume now that the ensemble of codes is defined such that $d_s(\boldsymbol{x}_m, \boldsymbol{x}_{m'})$ cannot exceed $nD_{\max}$,
where $D_{\max} < \infty$ is a constant that does not depend on $n$, which is normally the case when the
codewords must comply with input constraints. Then using a similar technique as in eq. (31), we
now obtain

$$A_n(R,\rho) \stackrel{\cdot}{\le} \sup_{k} \left( \boldsymbol{E}\left\{ [N_m(k)]^{1/\rho} \right\} \right)^{\rho} \cdot e^{-nk\delta}, \tag{40}$$
where the notation $\stackrel{\cdot}{\le}$ denotes inequality in the exponential scale (more formally, $a_n \stackrel{\cdot}{\le} b_n$ means
$\limsup_{n \to \infty} \frac{1}{n} \ln \frac{a_n}{b_n} \le 0$). The key issue is now to assess the exponential rate of the expectation of
the binary random variable,

$$U_{m'} = 1\{nk\delta \le d_s(\boldsymbol{x}_m, \boldsymbol{X}_{m'}) < n(k+1)\delta\}, \tag{41}$$

for a given $\boldsymbol{x}_m$, namely, to find the exponent of $\Pr\{nk\delta \le d_s(\boldsymbol{x}_m, \boldsymbol{X}_{m'}) < n(k+1)\delta\}$. This can be
done using standard large deviations techniques, like the Chernoff bound. Let $R(k\delta)$ denote the
large deviations rate function of this probability (which depends, of course, on $\boldsymbol{x}_m$, but it would be
convenient to define the ensemble such that this rate function will be the same for all $m$). Then,
as in eq. (32), we have

$$\boldsymbol{E}\left\{ [N_m(k)]^{1/\rho} \right\} \doteq \begin{cases} \exp\{n[R - R(k\delta)]\} & R \le R(k\delta) \\ \exp\{n[R - R(k\delta)]/\rho\} & R > R(k\delta) \end{cases} \tag{42}$$

Now, similarly as in Section 4, $A_n(R,\rho)$ is dominated by $\min\{E_1(R,\rho,\delta), E_2(R,\delta)\}$, where

$$E_1(R,\rho,\delta) = \inf_{k:\ R(k\delta) \ge R} [k\delta + \rho R(k\delta)] - \rho R, \tag{43}$$

and

$$E_2(R,\delta) = \inf_{k:\ R(k\delta) \le R} [k\delta + R(k\delta)] - R. \tag{44}$$

Upon taking the limit $\delta \to 0$, these become

$$E_1(R,\rho) = \inf_{D:\ R(D) \ge R} [D + \rho R(D)] - \rho R, \tag{45}$$

and

$$E_2(R) = \inf_{D:\ R(D) \le R} [D + R(D)] - R. \tag{46}$$

The remaining details depend, of course, on the form of the large deviations rate function $R(D)$,
which in turn depends strongly on the input assignment and the channel.

Example 2 - the Gaussian channel. Consider the memoryless additive Gaussian channel $Y = X + Z$,
where $Z$ is a zero-mean Gaussian random variable with variance $\sigma^2$, independent of $X$. Let $q(\boldsymbol{x})$
be the uniform distribution over the surface of the $n$-dimensional sphere with radius $\sqrt{nS}$. In
this case, the Chernoff distance is maximized at $s^* = 1/2$, where it agrees with the Bhattacharyya
distance $d_B(x,x') = (x - x')^2/(8\sigma^2)$. It is not difficult to show (e.g., using the methods of [11]) that

$$R(D) = \frac{1}{2} \ln\left[ \frac{S}{8\sigma^2 D (1 - 2\sigma^2 D/S)} \right], \tag{47}$$

which has the interpretation of the rate-distortion function of the Gaussian source with variance $S$
w.r.t. the Bhattacharyya distortion measure with the additional constraint that the reproduction variable
$X'$ is also Gaussian, zero-mean and with variance $S$. The corresponding distortion-rate function
(which is the inverse of $R(D)$) is given by

$$D(R) = \frac{S}{4\sigma^2} \left[ 1 - \sqrt{1 - e^{-2R}} \right], \tag{48}$$

which is also the curvy part of the corresponding expurgated exponent. The linear part is again
the tangential straight line with slope $-1$.
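
For the Gaussian example, the curvy and linear parts can be traced numerically. The sketch below is illustrative only: the closed-form inverse used in `distortion_rate` is obtained here by solving eq. (47) for $D$ on the branch $D \le S/(4\sigma^2)$, the bisection search for the tangency rate and all function names are assumptions, and `snr` stands for $S/\sigma^2$.

```python
import math

def rate_function(d, snr):
    """R(D) of eq. (47), with snr = S / sigma^2, for 0 < D < snr / 4."""
    u = 2.0 * d / snr  # equals 2 * sigma^2 * D / S
    return 0.5 * math.log(1.0 / (4.0 * u * (1.0 - u)))

def distortion_rate(r, snr):
    """Inverse of R(D) on the branch D <= snr / 4 (solved from eq. (47))."""
    return 0.25 * snr * (1.0 - math.sqrt(1.0 - math.exp(-2.0 * r)))

def slope(r, snr):
    """Derivative of distortion_rate with respect to r."""
    e = math.exp(-2.0 * r)
    return -0.25 * snr * e / math.sqrt(1.0 - e)

def critical_rate(snr, lo=1e-9, hi=50.0, iters=200):
    """Bisection for the rate R_1 at which the slope equals -1."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid, snr) < -1.0:
            lo = mid   # still steeper than -1, so R_1 lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expurgated_exponent(r, snr):
    """Curvy part up to R_1, then the tangent line of slope -1 (clipped at zero)."""
    r1 = critical_rate(snr)
    if r <= r1:
        return distortion_rate(r, snr)
    return max(0.0, distortion_rate(r1, snr) + r1 - r)

snr = 4.0  # assumed signal-to-noise ratio, for illustration only
r1 = critical_rate(snr)
print(r1, expurgated_exponent(0.1, snr), expurgated_exponent(r1 + 0.1, snr))
```

Because the distortion-rate curve is convex, the tangent line lies below it, so the linear part never exceeds the curve it continues.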

6 The Statistical-Mechanical Perspective

Let us take another look at the central expression that was handled in Sections 4 and 5, namely,
the summation

$$Z = \sum_{m' \ne m} e^{-d_s(\boldsymbol{x}_m, \boldsymbol{x}_{m'})}. \tag{49}$$

From the viewpoint of statistical physics, this can be interpreted as the partition function of a
physical system, where for a fixed $\boldsymbol{x}_m$, the various configurations (microstates) are $\{\boldsymbol{x}_{m'}\}_{m' \ne m}$ and
the Hamiltonian (energy function) is given by (or proportional$^5$ to) $d_s(\boldsymbol{x}_m, \boldsymbol{x}_{m'})$. If the correct
codeword $\boldsymbol{x}_m$ is given and the remaining codewords are considered independent and random, thus
denoted $\{\boldsymbol{X}_{m'}\}$, then the various "configurational energies" $\{d_s(\boldsymbol{x}_m, \boldsymbol{X}_{m'})\}$ are also independent
random variables. As explained in [18, Chapters 5, 6] (see also [15, Chapters 6, 7] and references
therein), this setting is analogous to the random energy model (REM) in the literature of statistical
physics of magnetic materials. The REM was invented by Derrida [4], [5], [6] as a model of extremely
disordered spin glasses. This model is not realistic, but it is exactly solvable and it exhibits a phase
transition: Below a certain critical temperature, the partition function becomes dominated by
a sub-exponential number of configurations, which means that the system freezes in the sense
that its entropy vanishes in the thermodynamic limit. This combination of freezing and quenched
disorder resembles the behavior of a glass, and so, this low temperature phase of zero entropy is
called the glassy phase.$^6$ Above the critical temperature, the partition function is dominated by an
exponential number of configurations, and so, its entropy is positive. This high temperature phase
is called the paramagnetic phase.

5 To enhance the analogy with physics, it is instructive to consider a parametric family of channels, $p_\beta(y|x) \propto
[p(y|x)]^{\beta}$, where $\beta$ is a parameter that controls the 'quality' of the channel (e.g., the SNR in the case of the Gaussian
channel), whose physical meaning is inverse temperature. In this case, $d_s(\boldsymbol{x}_m, \boldsymbol{x}_{m'})$ of the channel pertaining to $\beta = 1$
would be multiplied by $\beta$, similarly as in ordinary partition functions.

6 In physics, it typically occurs as a result of a process of rapid cooling.

In the derivations of Sections 4 and 5, the curvy part of the graph of $\mathcal{E}(R,Q)$ corresponds to the
glassy phase of the REM associated with (49), because the dominant contribution to $A_n(R,\rho)$ is due
to a subexponential number ($N_m(Q_{XX'})$ or $N_m(k)$) of codewords whose distance from $\boldsymbol{x}_m$ is about
$nD_Q(R)$. The straight-line part of $\mathcal{E}(R,Q)$, on the other hand, corresponds to the paramagnetic
phase, where about $e^{n(R - R_1)}$ incorrect codewords at distance $nD_Q(R_1)$ dictate the behavior. Thus,
the passage between the curvy part and the straight-line part, at $R = R_1$, is interpreted as a glassy
phase transition.

In the Gallager expurgated bound, there is also a passage from a curvy part at low rates to a
straight-line part at high rates. However, in Gallager's derivation, the passage happens due to a
more technical reason. Since Gallager's analysis is based on the inequality $[\sum_{m'} a_{m'}]^{1/\rho} \le \sum_{m'} a_{m'}^{1/\rho}$,
which holds only for $\rho \ge 1$, the maximization over $\rho$ is a priori limited to the range $\rho \ge 1$. The
linear part of the curve is then generated due to the fact that for higher rates, the unconstrained
achiever of $E_{ex}(R)$ is $\rho^* < 1$, and so, the constrained one remains $\rho^* = 1$, independently of $R$ in
this range.